Scalable Levy Process Priors for Spectral Kernel Learning
نویسندگان
چکیده
Gaussian processes are rich distributions over functions, with generalization properties determined by a kernel function. When used for long-range extrapolation, predictions are particularly sensitive to the choice of kernel parameters. It is therefore critical to account for kernel uncertainty in our predictive distributions. We propose a distribution over kernels formed by modelling a spectral mixture density with a Lévy process. The resulting distribution has support for all stationary covariances—including the popular RBF, periodic, and Matérn kernels— combined with inductive biases which enable automatic and data efficient learning, long-range extrapolation, and state of the art predictive performance. The proposed model also presents an approach to spectral regularization, as the Lévy process introduces a sparsity-inducing prior over mixture components, allowing automatic selection over model order and pruning of extraneous components. We exploit the algebraic structure of the proposed process forO(n) training andO(1) predictions. We perform extrapolations having reasonable uncertainty estimates on several benchmarks, show that the proposed model can recover flexible ground truth covariances and that it is robust to errors in initialization.
منابع مشابه
Deep Kernel Learning
We introduce scalable deep kernels, which combine the structural properties of deep learning architectures with the nonparametric flexibility of kernel methods. Specifically, we transform the inputs of a spectral mixture base kernel with a deep architecture, using local kernel interpolation, inducing points, and structure exploiting (Kronecker and Toeplitz) algebra for a scalable kernel represe...
متن کاملThe Deep Feed-Forward Gaussian Process: An Effective Generalization to Covariance Priors
We explore ways of applying a prior on the covariance matrix of a Gaussian Process (GP) in order to increase its expressive power. We show that two well-known covariance priors, Wishart Process and Inverse Wishart Process, boil down to a two-layer feed-forward network of GPs with a particular kernel function on the neuron at the output layer. Both of these models perform supervised manifold lea...
متن کاملMultiple Gaussian Process Models
We consider a Gaussian process formulation of the multiple kernel learning problem. The goal is to select the convex combination of kernel matrices that best explains the data and by doing so improve the generalisation on unseen data. Sparsity in the kernel weights is obtained by adopting a hierarchical Bayesian approach: Gaussian process priors are imposed over the latent functions and general...
متن کاملScalable Kernel Embedding of Latent Variable Models∗
Kernel embedding of distributions maps distributions to the reproducing kernel Hilbert space (RKHS) of a kernel function, such that subsequent manipulations of distributions can be achieved via RKHS distances, linear and multilinear transformations, and spectral analysis. This framework has led to simple and effective nonparametric algorithms in various machine learning problems, such as featur...
متن کاملCharacterizing the Function Space for Bayesian Kernel Models
Kernel methods have been very popular in the machine learning literature in the last ten years, mainly in the context of Tikhonov regularization algorithms. In this paper we study a coherent Bayesian kernel model based on an integral operator defined as the convolution of a kernel with a signed measure. Priors on the random signed measures correspond to prior distributions on the functions mapp...
متن کامل